Theoretical considerations of potential-based reward shaping for multi-agent systems
Authors
Sam Devlin, Daniel Kudenko
Abstract
Potential-based reward shaping has previously been proven both to be equivalent to Q-table initialisation and to guarantee policy invariance in single-agent reinforcement learning. The method has since been used in multi-agent reinforcement learning without consideration of whether the theoretical equivalence and guarantees hold. This paper extends the existing proofs to similar results in multi-agent systems, providing the theoretical background to explain the success of previous empirical studies. Specifically, it is proven that the equivalence to Q-table initialisation remains and that the Nash equilibria of the underlying stochastic game are not modified. Furthermore, we demonstrate empirically that potential-based reward shaping affects exploration and, consequently, can alter the joint policy converged upon.
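The shaping term the abstract refers to has the standard form F(s, s') = γΦ(s') − Φ(s) introduced by Ng et al. (1999). Below is a minimal sketch of how it plugs into tabular Q-learning; the grid-world potential, goal location, and hyperparameters are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of potential-based reward shaping in tabular Q-learning.
from collections import defaultdict

GAMMA = 0.95  # discount factor
ALPHA = 0.1   # learning rate

def potential(state):
    # Illustrative heuristic potential Phi(s): negative Manhattan distance
    # to a hypothetical goal cell in a grid world (an assumption, not from
    # the paper). Any real-valued function of state fits the theory.
    goal = (4, 4)
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))

def shaping(state, next_state):
    # The potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s).
    return GAMMA * potential(next_state) - potential(state)

Q = defaultdict(float)  # tabular Q-values, keyed by (state, action)

def q_update(state, action, reward, next_state, actions):
    # Standard Q-learning update with the shaping term added to the
    # environment reward; per the results above, this changes exploration
    # but not the Nash equilibria of the underlying stochastic game.
    shaped_reward = reward + shaping(state, next_state)
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (shaped_reward + GAMMA * best_next - Q[(state, action)])
```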
Similar papers
A Theoretical and Empirical Analysis of Reward Transformations in Multi-Objective Stochastic Games
Reward shaping has been proposed as a means to address the credit assignment problem in Multi-Agent Systems (MAS). Two popular shaping methods are Potential-Based Reward Shaping and difference rewards, and both have been shown to improve learning speed and the quality of joint policies learned by agents in single-objective MAS. In this work we discuss the theoretical implications of applying th...
Plan-based reward shaping for multi-agent reinforcement learning
Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains of how to generate a useful potential function. Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learnin...
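The STRIPS-based construction that paper develops is not reproduced in this excerpt; the sketch below only illustrates the general idea of a plan-based potential, where Φ(s) grows with progress through an abstract plan. The plan steps and the completed predicate are hypothetical placeholders.

```python
# Hedged sketch of a plan-based potential function: Phi(s) is the number of
# consecutive high-level plan steps already satisfied in state s.
PLAN = ["pick_up_key", "open_door", "reach_goal"]

def plan_potential(state, completed):
    # completed(state, step) -> bool is an assumed domain predicate.
    steps_done = 0
    for step in PLAN:
        if completed(state, step):
            steps_done += 1
        else:
            break  # progress counts only up to the first unmet step
    return float(steps_done)
```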
Potential-based reward shaping for knowledge-based, multi-agent reinforcement learning
Reinforcement learning is a robust artificial intelligence solution for agents required to act in an environment, making their own decisions on how to behave. Typically an agent is deployed alone with no prior knowledge but, given sufficient time, a suitable state representation and an informative reward function, it is guaranteed to learn how to maximise its long-term reward. Incorporating doma...
Dynamic potential-based reward shaping
Potential-based reward shaping can significantly improve the time needed to learn an optimal policy and, in multiagent systems, the performance of the final joint-policy. It has been proven to not alter the optimal policy of an agent learning alone or the Nash equilibria of multiple agents learning together. However, a limitation of existing proofs is the assumption that the potential of a stat...
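The extension discussed there lets the potential change while the agent learns: the dynamic formulation (Devlin and Kudenko, 2012) adds the time of each state visit as an argument, giving F(s, t, s', t') = γΦ(s', t') − Φ(s, t). A minimal sketch, assuming phi is a caller-supplied time-varying potential:

```python
GAMMA = 0.95  # discount factor

def dynamic_shaping(phi, state, t, next_state, t_next):
    # F(s, t, s', t') = gamma * Phi(s', t') - Phi(s, t), where t and t_next
    # are the times at which the two states are visited. phi(state, time)
    # is an assumed time-varying potential supplied by the caller.
    return GAMMA * phi(next_state, t_next) - phi(state, t)
```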
Potential-based difference rewards for multiagent reinforcement learning
Difference rewards and potential-based reward shaping can both significantly improve the joint policy learnt by multiple reinforcement learning agents acting simultaneously in the same environment. Difference rewards capture an agent’s contribution to the system’s performance. Potential-based reward shaping has been proven to not alter the Nash equilibria of the system but requires domain-speci...
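Difference rewards, as summarised above, evaluate an agent by the change in global performance when that agent is removed: D_i(z) = G(z) − G(z_{−i}). A minimal sketch, where the performance function G and the removal operation are assumptions for illustration:

```python
def difference_reward(G, z, agent_i, remove_agent):
    # D_i(z) = G(z) - G(z_{-i}): global performance with agent i present,
    # minus performance with agent i removed (or replaced by a default
    # action). G and remove_agent are caller-supplied assumptions.
    return G(z) - G(remove_agent(z, agent_i))
```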
Publication date: 2011